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ABSTRACT 

Algorithms  for  computing  the  exact  likelihood  function  of  n succes- 
sive observation  vectors  from  an  s-variate  autoregressive  moving  average 
process  of  order  (p,  q)  are  developed.  A quasi-Newton  method  is  used 
to  maximize  the  likelihood  function  with  respect  to  the  parameters  of  the 
process.  Monte  Carlo  simulations  are  performed  to  compare  the  parameter 
estimates  obtained  by  maximizing  the  exact  likelihood  function  versus 
those  obtained  by  maximizing  various  approximate  forms  of  the  likelihood 
function. 

AMS  (MOS)  Subject  Classification:  62M10 

KeyWords:  likelihood  function,  multivariate,  autoregressive,  moving 
average  models 
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MAXIMUM  LIKELIHOOD  ESTIMATION  OF  MULTIVARIATE 


AUTOREGRESSIVE-MOVING  AVERAGE  MODELS 


M.  S.  Phadke  and  G.  Kedem 


1.  INTRODUCTION 

Let  a(t)  be  s-dimensional  white  noise  series  with  mean  zero  and  diagonal,  positive 
definite  covariance  matrix  2.  A zero-mean  s-dimensional  autoregressive-moving  average 
(ARMA)  process  of  order  (p.q),  denoted  by  Y(T)  is  defined  by 

p q 

2 <Mj)Y(T— j)  = £ 0(j)a(T-j),  (1) 

j=0  j=0 

where  £(j)’s  and  9(j)’s  are  sxs  real  matrices;  <p(0)  = I;  and  9(0)  is  a lower  triangular 
matrix  with  diagonal  elements  equal  to  unity.  Note  that  there  is  no  loss  of  generality  by  the 
stated  assumptions  on  2 , <p( 0)  and  0(0). 


<D(z)  = det  ^1  4>(j)z^ 

j=l 


■ 

6(z)  = det  ^ 0(0)  + £ 0(j)z>)  (3) 

V i=i  7 

The  stationarity  condition  requires  that  the  zeros  of  <l>(z)  = 0 lie  outside  the  unit  circle;  and 
the  canonicalness  condition  requires  that  ihc  zeros  of  0(z)  = 0 lie  outside  or  on  the  unit 
circle  (see,  e.g..  Hannan  (9|). 
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V 


Let  y(k)  be  the  sxs  matrix  of  covariance  at  lag  k. 


y(k)  = E (Y(T)  YT(-k)] 

Note  that  y(k)  = yT(-k).  Let  ¥ denote  the  vector  formed  L>y  n consecutive  observations 
Y(T),  T = 1, n: 


Yt=  [Y,(l)  Y2(l) Ys(l) Y,(n) Ys(n)] 


The  covariance  matrix  of  ¥,  denoted  by  T,  is  a symmetric,  block  Toeplitz  matrix  whose  first 
row  of  sxs  blocks  is  y(0),  y(l), y(n-l),  i.e. 


r . SBT  [y(0)  y(I) 

Fr(O)  y(  1 ) 

y(l'T  y(0) 


y(n-D] 


r(2)  y(n-l) 

yU)  y(n-2) 


y(n-l)T  y(n-2)T 


y(0) 


(4) 


Assuming  that  the  white  noise  series  a(T)  has  Gaussian  distribution,  the  log  likelihood 
function  (l.l.f.)  of  the  parameters  of  the  model  (1)  is  given  by 


HP)  - - - Inin  - !yTrlY  (5) 

2 2 

where  the  constant  term  has  been  ignored  and  P stands  for  all  the  parameters  in  4>(j)’s,  6(j)'s 
and  2. 

The  equations  obtained  by  setting  the  derivatives  of  the  l.l.f.  equal  to  zero  are  highly 
nonlinear  and  are  extremely  difficult,  if  not  impossible,  to  solve  analytically.  Hence,  to  obtain 
the  maximum  likelihood  estimates  (m.l.e.)  of  the  parameters  p,  the  l.l.f.  must  be  maximized 
numerically.  Further,  the  computation  of  the  l.l.f.  at  a given  point  p is  complicated  due  to  the 
difficulties  in  computing  T-1  and  del  T.  Therefore,  approximate  methods  of  computing  the 
l.l.f.  and  maximizing  the  same  have  been  proposed  i)  in  time  domain  by  Box  and  Jenkins  [4], 
Astrom  (3],  Wilson  [23],  and  Phadke  [16],  and  ii)  in  frequency  domain  by  Hannan  [8,9], 
Anderson  [2],  Akaike  [1]  and  Parzen  [15].  Asymptotically  the  modifications  in  l.l.f.  have 
negligible  effect.  (See,  for  example.  Box  and  Jenkins  [4]  and  Wahba  [22].)  However,  for 
finite  sample  sizes,  particularly  when  the  zeros  of  8(z)  = 0 are  near  or  on  the  unit  circle, 
these  approximate  m.l.e.  can  have  much  larger  mean  squared  errors  (m.s.c.)  than  the  exact 


The  objectives  of  this  paper  are  i)  to  propose  efficient  algorithms  for  computing  the  exact 

1.1. ff.,  and  ii)  to  perform  simulation  for  comparing  the  m.s.e.  of  exact  m.I.e.  against  the  m.s.e. 
of  approximate  m.I.e.  The  problem  of  computing  the  exact  1.1. f.  of  univariate  ARMA  models 
has  been  studied  by  Tiao  and  Ali  [20),  Shaman  [18,19],  Uppuluri  and  Carpenter  [21],  Newbold 
[13]  and  Ljung  [12].  The  case  of  multivariate  models  was  recently  investigated  by  Hillmer 
[10]. 

Since  the  main  difficulties  of  maximum  likelihood  estimation  are  due  to  the  moving 
average  part,  the  case  of  pure  moving  average  models  (p=0)  will  be  presented  first.  In 
Section  2 we  briefly  review  the  frequency  domain  and  the  time  domain  approximations  to  the 

1.1.  f.  of  moving  average  models.  In  Section  3 we  present  three  methods  for  computing  the 
exact  1.1. f.  of  moving  average  models  and  compare  the  relative  computational  efficiencies  of  the 
three  methods.  Computation  of  the  exact  1.1. f.  of  ARMA(p,q)  models  is  discussed  in  Section 
4.  A method  for  maximizing  the  1.1. f.  is  described  in  Section  5;  and  Section  6 presents  some 
simulation  results  for  moving  average  models. 

2.  APPROXIMATE  l.l.f  OF  MOVING  AVERAGE  MODELS 

For  the  q,h  order  moving  average  (MA(q))  model  the  covariance  function  is  given  by 
q-k 

^ 0(j)  2 0T(j+k),  0<k<q 

j=0 

y(k)  = yT(-k)  = (6) 

0,  k>q 

Hence,  the  corresponding  covariance  matrix  T has  block  band  structure. 

T = SBT  (y(0)  y(l)  y(q)  0 0 0]  (7) 


2.1  Frequency  Domain  Approximation 


Consider  the  modified  covariance  matrix  Tf  (ns  x ns  matrix)  defined  by 


The  matrix  Tf  differs  from  T only  in  the  top  right  corner  and  the  bottom  left  corner,  fn 
addition  to  being  symmetric,  block  Toeplitz,  the  matrix  Tj  is  also  a circulant  matrix  and 
hence  has  the  following  decompositon: 


Tf  « XhFX 

where  X is  nsxns  unitary  matrix  defined  by  X * {^jr}  w*iere  X -r  are  sxs  matrices, 

X:r  *»  — I$  exp  (2wijr/n)  0°) 

✓n 


for  j,  r = 1,  2 n;  the  superscript  H means  transpose  and  complex  conjugate;  and  F is 

nsxns  block  diagonal  matrix,  F = lFjr>.  where  Fjr  are  sxs  matrices. 


0 


j#r 


q 

^r(k)  exp  (-2wikr/n),  j**r 


(11) 


for  j , r = 1,  2,  ....  n.  Note  that  the  diagonal  blocks  Ffr  are  the  same  as  the  spectral  densities' 
of  the  moving  average  process  at  equispaced  frequencies. 


If  we  approximate  the  covariance  matrix  of  Y by  rf,  then  the  corresponding  approxi- 
mate l.l.f.  becomes 


h 


n 


js  lnf‘JetFrrl  + 
r=  1 


n 

£ w(r)HFrr-'w(r) 
r=l 


where  w(r)  is  sxl  vector. 


n 

w(r)  = 2 Y<T)  exP  ( 2v  ' Tr/n) 

T=1 


(12) 


(13) 


Note  that  w(r)  are  the  sample  Fourier  Transforms  of  the  observed  data. 

2.2  Time  Domain  Approximation 

Suppose  we  assume  that  the  innovations  a(0),  a(-l ),...,  a(-q+l)  are  all  equal  to  their 
expected  values,  i.e.,  equal  to  the  zero  vector.  Then  one  can  recursively  compute 

q 

a(T)  = 0(0)-'  ( Y(T)  - £ 0(j)a(T-j)J  (14) 

^ j=l 

for  T = 1,  2,...,  n.  Conditional  on  a(0)  = a(-l)  = ...  = a(-q+l)  = 0,  the  l.l.f.  of  P for 
given  observations  Y is  given  by 

n 

= - i jnxlnfdetS)  + ^ aT(T)2-'a(T)  | (15) 

T=  1 

Estimates  of  P obtained  by  minimizing  will  be  termed  as  the  approximate  time  domain 
m.I.e. 

Another  way  of  arriving  at  the  approximate  l.l.f.  L,  is  to  approximate  the  covariance 
matrix  of  Y by  the  following  matrix: 


r,  - L D lt 


(16) 


-5- 


(17) 


where  L is  nsxns  lower  triangular  mai.'x  defined  by 

R(0) 


0(0) 


for  j,  r = 1,  2,  ....  n;  and  D is  (nsxns)  block  diagonal  matrix  defined  by 


2 0 0 0 

0 2 0 0 

0 0 2 0 

D = 


J^O  0 0 2 

3.  EXACT  l.l.f.  OF  MOVING  AVERAGE  MODELS 


(18) 


An  efficient  algorithm  for  computing  the  l.l.f.  must  exploit  to  the  fullest  extent  the 
symmetric,  block-band,  block-Toeplitz  property  of  the  T matrix.  Further,  in  order  to  compute 
the  exact  l.l.f.  we  need  not  explicity  compute  f1;  but  it  is  enough  to  be  able  to  compute  det  T 
and  the  quadratic  form  l,Tr-lr.  In  the  following  we  propose  three  techniques  for  computing 
the  exact  l.l.f.  Each  method  utilizes  the  structure  of  T in  a different  way:  the  first  method  is 
based  on  the  Cholesky  decompostion  of  the  T matrix,  and  the  other  two  methods  are  based  on 
low  rank  corrections  to  the  approximate  l.l.f.  of  the  frequency  domain  method  and  of  the  time 
domain  method. 


3.1  Cholesky  Decomposition  Method 

Since  T is  symmetric,  positive  definite,  (2(q+l)s-l)  diagonal  band  matrix,  it  has  the 
following  Cholesky  decomposition  (sec,  e.g..  Noble  [14],  and  Conte  and  deBoor  [5]): 


r - ldlt 


(19) 


-6- 


where  L is  nsxns  lower  triangular,  (q+l)s  diagonal  band  matrix  with  the  entries  of  the 
principal  diagonal  equal  to  unity;  and  D is  nsxns  diagonal  matrix.  The  two  components  of  the 
1.1. f.  are  given  by 


yTr-'y  = (L-'y)T  d-'  (L-'y),  (20) 

n 

In  (det  T)  = In  (det  D)  = ^ In  Drr  (21) 

r=  1 


Thus,  the  1.1. f.  for  a given  value  of  the  /?  vector  can  be  computed  through  the  following 
steps; 

(i)  Compute  the  covariances  y(k)  for  q. 

(ii)  Compute  the  factors  L and  D by  a triangularization  algorithm  for  band  matrices. 

(iii)  Compute  L_,y  = u by  a back  substitution  algorithm  for  band  matrices. 

(iv)  Compute  the  weighted  inner  product  uTD“'u  and  the  In  (det  D). 

The  following  considerations  are  important  for  reducing  the  needed  storage  space  and 
computing  time. 

1)  It  is  not  necessary  to  store  the  entire  matrix  T.  The  desired  entries  of  T can  be  easily 

obtained  from  the  stored  values  of  y(0) y(q). 

2)  Steps  (ii),  (iii)  and  (iv)  need  not  be  carried  out  sequentially.  By  interfacing  (iii)  and  (iv) 
with  (ii),  one  needs  only  to  store  current  (q+l)s2/2  entries  of  L and  (q+l)s  entries  of 
D and  u. 

3)  The  fact  that  T has  block  band  structure  implies  that  T is  a band  matrix  of  variable 
width.  This  fact  should  be  utilized  to  reduce  the  computation  time  of  steps  (ii)  and  (iii). 
For  small  q,  the  savings  thus  achieved  can  be  appreciable. 

3.2  Frequency  Domain  Method 

Here  we  use  the  fact  that  T-rf  is  a sparse,  low  rank  (at  most  2qs)  matrix.  The 
following  identity  can  be  easily  verified; 

r - rf  + ugcctut 


(22) 


where  C,  G and  U are  of  dimensions  2qsx2qs,  2qsx2qs  and  nsx2qs  respectively  and 
arc  defined  by: 


where 

"r(q)T  y(q-l)T  y(l)T 

0 y(q)T  r(  2)T 

Gt  = 


[O  0 y(q)T  | 

The  matrix  C is  always  nonsingular  and  Tf  is  nonsingular,  except  possibly  in  the  case 
the  roots  of  0(z)  = 0 have  unit  modulus.  Excluding  the  cases  when  Tf  is  singular,  by 
Woodbury’s  formula  (see,  e.g..  Noble  [14]  and  Householder  [11]),  we  obtain: 

I**1  = ^f-|-^f-,uc[-c+^7u^f-,uTc:T^,^;TuT^f-,  (23) 

The  desired  quadratic  form  in  Y is  then  given  by 

* yTxHF-’xy-BT(7[-c+cTRC  ]-'ctb  (24) 

where  B * UTXf,F~,XY  is  2qsx  1 real  vector  and  R = UTXHF_,XU  is  2qsx2qs  real 
symmetric  matrix.  Note  that  the  first  term  on  the  right  hand  side  of  cq.  (24)  is  the  same  as 
the  approximate  quadratic  form  of  the  frequency  domain  method,  and  we  shall  call  the  second 
term  the  ‘low  rank  correction  term’  for  obvious  reasons.  By  Woodbury's  formula,  we  also 
have  (see  Noble  [14])  the  following  convenient  expression  for  computing  det  P. 


del  r . (det  rf)(det  C)(det  [-C+GTRC]) 


(25) 


wrz-”'- 


v l 


& : 


We  now  propose  the  following  algorithm  for  computing  the  l.l.f.  for  a given  value  of  the  /? 
vector: 


i)  Compute  the  discrete  Fourier  transform,  w(r),  r = 1,  2 n of  the  observed  data 

Y(T)  by  eq.  (13). 


ii)  Compute  the  covariances  y(k),  k = 1, q by  Eq.  (6). 


iii)  Compute  the  spectral  density  matrices  Ffr,  r = 1,2 n by  eq.  (9). 

iv)  Compute  v(r)  = F “*  w(r),  r = 1,2 n. 


v)  Compute 


L 2e»P  (-~)v<D.  ISi<q 


r=  1 


b(j) 


(26) 


1_ 

✓n 


n 

S exP 

r=  1 


/2wir(n+j-2q)  \ , . _ 

^ j v(r),  q+1  <j<2q 


Note  that  b(j)’s  are  sxl  real  vectors.  The  vector  • is  given  by 
BT  = [bT(l)  bT(2) bT(2q)] 


vi)  Computation  of  R:  Let  R be  partitioned  into  sxs 
submatrices  Rj  k,  R = {Rj  kJ,  for  j,  k - 1,  2, 

2q.  R is  a real  symmetric  matrix  and  its  entries  in  the  upper  triangle  are  given  by 


R. 


j.k 


1 2 „p  q 


r=  1 


1 ^ /2irir(n+k-2q-j)  \ , 

. ^ ««p  (— ) 


Frr"'  for  j=  1 q;  k=q+ 1 2q  (27) 


r=  1 


1 v /«:irir(k— j)  \ 

- x «»p  (— — ) 


wir(k-j)  \F  _| 

^rr 


for  j,k=q+ 1 2q 


r=  1 


-9- 


A careful  examination  of  the  above  formula  reveals  that  one  need  only  compute  R,  k 

for  k = 1,  2 q and  R,  q+ , for  j = I,  2 q.  The  remaining  block  entries 

R . are  equal  to  these  entries  or  their  transposes. 

J.K 

N 

vii)  Compute  the  approximate  quadratic  form 

n 

Qj  = ^ w(r)Hv(r)  and  det  T{. 

r=  1 

viii)  Complete  the  computation  of  1.1. f.  by  appropriate  substitutions  in  eqs.  (24),  (25)  and 
(5). 


Considerable  saving  in  storage  space  can  be  achieved  by  performing  steps  iii)  through  viii) 
in  a loop  on  r and  accumulating  the  various  quantities  as  one  goes  along.  Further,  one  need 
only  go  through  half  the  loop  on  r since  the  quantities  associated  with  r =j  and  r = n-j 
are  complex  conjugates.  The  cases  of  even  and  odd  n will  have  to  be  treated  slightly 
differently. 


3.3  Time  Domain  Method 

The  low  rank  correction  method  in  time  domain  is  based  on  the  fact  that  T - T,  is  a 
sparse,  low  rank  (at  most  qs)  matrix.  We  can  easily  verify  that 


r = rt  + u cccTuT 


(28) 


where  the  matrices  C,  £ and  U are  of  dimensions  qsxqs,  qsxqs,  nsxqs,  and  they  are 
defined  by 


8(q)  fl(q-l) 

0 8(q)  . 


Q - 


8(1) 

8(2) 


0 


0 


8(q) 


UT-  ( i ^ o o 


01 


>^>W 


I ■ — ™... 


Since  both  C and  Tt  are  nonsingular,  by  Woodbury’s  formula,  we  have: 

i“'  = r -r,  -'u  q [ c-1  + gt  uT  r u g r1  gt  uT  r-'  (29) 

Thus,  the  desired  quadratic  form  in  Y is  given  by 

yTr~,y=rTL-TD-1  l-'y  - s.tg  [ c-'  + ctrc  r1  gt  b (30) 

where  B = HTD_,L_ly  isqsxl  vector,  R = HD_,Ht  is  qsxqs  symmetric  matrix  and  H 
= L*1  U is  nsxns  matrix.  Thus  the  desired  quadratic  form  is  seen  to  consist  of  the  approxi- 
mate time  domain  quadratic  form  and  a low  rank  correction  term.  By  Woodbury’s  formula,  we 
also  have  a convenient  formula  for  computing  det  T 

det  T = (det  T,)  (det  C)  (det  [C"1  CT  R G])  (31) 

We  now  have  the  following  algorithm  for  computing  the  1.1. f.  for  a given  value  of  the  p 
vector. 

i)  Compute  a(T)  for  T=l,2 n by  eq.  (12)  by  assuming  a(0)  = a(-l)=  ...  = 

a(-q+ 1 ) = 0 

ii)  The  approximate  quadratic  form  is  given  by: 

n 

]£  aT(T)  2"'  a(T) 

T=1 

iii)  Let  H be  pardoned  into  sxs  submatrices  Hrk,  H = {Hfk}  for  r = 1,  2,  ...,  n and  k = 
1,  2,  ...,  q.  These  submatrices  can  be  computed  by  the  following  equations. 


HU 

= 0(0)-' 

min(q,r) 

Hr,l 

= 0(0 )-' 

(2 

fl(j)  Hr_j,  i J forr=2 

J=1 

I ° 

if  r<j  and  j=2,...,q 

Hr,j 

= 

( «, 

-i+U 

if  r>j  and  j=2 q 

iv)  Compute 


I 


p> 


n 

b(j)  = Y,  HTr>j  2-'  a(r)  forj=l q. 

r=  1 

Note  that  b (j)’s  are  sx  1 vectors.  The  vector  g is  given  by: 
g - [b(l)T  b(2)T  ...  b(q)T] 


v)  Computation  of  R:  Let  R be  partitioned  into  sxs  submatrices  R ■ k R = {Rj  k]  for 

j.  It  = 1,  ....  q.  R is  symmetric  matrix  and  its  entries  in  the  upper  triangle  are  given  by: 

n-k-1 

Si.k  - 2 HT„m  . 1 Ht,, 

r=  1 

w Note  that  while  computing  R ( j one  get  Rj  ■ j = 2,  ....  q as  intermediate  results  for, 

j>2.  Similarly,  while  computing  Rj  2 R,  3 , ....  R,  one  gets  the  remaining  entiries  as 

intermediate  results. 

vi)  Use  Eqs  (30),  (31)  and  (5)  to  complete  the  computation  of  the  l.l.f. 

Similar  to  the  frequency  domain  method,  considerable  savings  in  storage  space  may  be 
achieved  by  performing  the  above  steps  in  a loop  on  r. 


3.4  Coaparison  of  the  Three  Methods 

The  three  methods  of  computing  the  exact  l.l.f.  utilize  the  structure  of  T in  different 
ways  and  hence  may  be  expected  to  have  different  computational  efficiencies.  To  facilitate  the 
comparison  of  the  computational  effort  involved  in  each  method,  we  give  the  following 
formulas  for  operation  counts  (OC): 

Cholesky  Decompostion  Method 

OC  «=  n lqV/2  + qs2(s+4)/2  + s(s+3)(s+5)/8]  (32) 

Frequency  Domain  Method 

OC  - n [q(3s)(s+l)+s«8/3)s2  + (3/2))]  (33) 


: 

i 
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Time  Domain  Method 


t 


i 


OC  = n[(qs2(2s+3)  + s(s2  + 3)/2]  (34) 

where  by  an  operation  we  mean  a multiplication  or  division  of  two  real  numbers.  Thus 
multiplication  of  two  complex  numbers  is  equivalent  to  four  operations.  In  deriving  the  above 
formula  terms  that  do  not  depend  on  n have  been  ignored. 

A comparison  of  the  operation  counts  given  by  the  above  equations  indicates  the  follow- 
ing particular  choices: 

i)  s = 1:  Cholesky  method  has  the  smallest  OC  for  q<4  and  the  time  domain  method 
has  the  smallest  OC  forq>5 

ii)  s = 2:  Cholesky  method  is  the  best  for  q<3  while  the  frequency  domain  method  is 
the  best  for  q>4 

iii)  s = 3,  s = 4:  Cholesky  method  is  the  best  for  q<2  and  the  frequency  domain 

method  has  the  smallest  OC  for  q>3. 

4.  ESTIMATION  OF  ARMA  (p,q)  MODELS 

Consider  the  ARMA  (p,q)  process  as  being  defined  by  Eq.(l)  and  Z(T)  as  being 
defined  by: 


P 

Z(T)  = 2 *(j)Y(T-j)  (35) 

j=0 


Then  Z(T)  is  a MA(q)  process.  We  have 

p (Y(p+ 1),  Y(p+2) Y(n);  Y(l) Y(p),  0) 

-p(Z(p+l) Z(n);  Y(l) Y(p),  0)  (36) 

since  the  Jacobian  of  the  linear  transformation  from  Z(T)’s  to  Y(T)’s  is  equal  to  unity. 

Thus,  the  l.l.f.  of  Y(p+1),  ....  Y(n)  with  the  first  observations  Y(l) Y(p)  considered 

fixed,  can  be  computed  by  one  of  the  three  methods  described  in  the  previous  section.  For 
obvious  reasons,  the  efficiency  of  the  parameter  estimates  obtained  by  maximizing  the  above 
conditional  likelihood  is  at  least  (n-p)/n.  Thus,  for  most  practical  situation  where  n >>  p, 
these  conditional  estimates  may  prove  to  be  satisfactory. 

The  exact  likelihood  function  may  be  computed  using  the  Bayes's  theorem: 
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p(Y;0)  = p (Y(l) Y(p);  0) 

x p (Y(p+ 1) Y(n);  Y(l) Y(p),  0)  (37) 

Computation  of  p(Y(l) Y(p);  0)  involves  computing  the  covariance  matrices  y(0), 

y(p-l)  of  the  ARMA(p.q)  process  which  is  quite  complicated.  So  for  most  practical 

situations  where  n >>  p.  one  may  elect  to  maximize  the  conditional  likelihood  function  of 
Eq.  (36)  instead  of  the  exact  likelihood  function  given  by  Eq.  (37). 

5.  OPTIMIZATION  METHOD 

A quasi-Newton  method  due  to  Gill  and  Murray  [7]  for  unconstrained  optimization  of 
general  nonlinear  functions  may  be  used  to  maximize  the  1.1. f.  The  algorithm  computes  the 
needed  derivatives  of  the  objective  function  internally  by  finite  difference  approximation  and 
updates  the  Hessian  by  Daviden-Fletcher-Powell  correction  formula  (6].  The  convergence  rate 
of  the  quasi-Newton  method  is  superlinear. 

Two  Stage  Optimization:  Since  computation  of  the  exact  l.l.f  is  much  more  expensive 

than  computation  of  either  of  the  two  approximations  to  the  l.l.f.,  it  is  advisable  to  first 
maximize  an  approximate  l.l.f.  and  then  use  the  resulting  estimates  and  the  Hessian  as  starting 
values  for  maximizing  the  exact  l.l.f. 

Maximization  of  the  approximate  l.l.f:  For  maximization  of  the  approximate  l.l.f.  one 
may  take  the  starting  values  to  be  ‘no  prior  information’  values  viz.  <j>(l)  = <<>(2)  = ...  = $(p) 

= 0,  0(0)  = I,  0(j)  =0  for  j = r q,  and  the  diagonal  entries  2. ■ of  2 equal  to  the 

variances  of  the  respective  Yj(T)  series.  Or  one  may  use  the  procedures  described  by  Box 
and  Jenkins  [4 ] for  univariable  time  series  and  by  Hannan  (8,9]  for  multivariate  cases.  Our 
maximization  of  the  approximate  l.l.f.  by  the  quasi-Newton  method  with  ‘no  prior  information’ 
starting  values  has  been  very  encouraging. 

Positive  definiteness  of  0:  While  maximizing  the  approximate  or  the  exact  l.l.f.,  one  must 
ensure  that  0 remains  positive  definite.  By  our  formulation  of  the  ARMA  models  with  0(0) 
as  lower  triangular  matrix  and  0 as  diagonal,  the  check  for  positive  definiteness  becomes  very 
simple.  Further  in  the  numerous  simulations  we  performed,  it  was  observed  that  requiring  the 
diagonal  entries  of  2 to  be  positive  was  redundant,  since  the  objective  function  attaches  as 
natural  penalty  against  excusions  into  nonpositive  values  of  2. 

Canonicalness  of  the  optimum  point:  Performing  constrained  optimization  to  obtain 
canonical  values  of  6 parameters  is  an  impractical  task.  Thus,  one  may  perform  an  uncon- 
strained optimization,  check  for  canonicalncss  and  if  necessary  perform  the  transformations 
developed  by  Rozanov  (17]  and  illustrated  by  Phadke  (16]  for  obtaining  equivalent  canonical 
model. 
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6.  SIMULATION  RESULTS 


Monte  Carlo  simulations  were  performed  to  compute  the  mean  squared  errors  (m.s.e.)  of 
parameter  estimates  obtained  by  maximizing  the  two  approximate  forms  of  the  l.l.L.  and  those 
obtained  by  maximizing  the  exact  l.l.f.  Table  1 gives  the  m.s.e.  for  the  parameters  of  the 
univariate  MA(1)  model  obtained  by  simulating  100  time  series  of  100  observations  each 
with  9(1 ) = .95  and  a a2  = 1.  Table  2 gives  similar  simulation  results  for  the  univariate 
MA(2)  model  with  9(1)  = 0,  9(2 ) **  .95  and  o*  = 1;  and  Table  3 for  the  bivariate  MA(1) 
model  with 

9(0)  = 


1 0 

’.98  1.96" 

‘l  0 

,9(1)  = 

2 « 

1 1 

x> 

OO 

1 

0 1 

From  Tables  1,  2 and  3 we  observe  that  the  time  domain  approximate  method  gives 
considerably  smaller  m.s.e.  than  the  corresponding  m.s.e.  for  the  frequency  domain  approxi- 
mate method,  while  the  exact  likelihood  method  gives  much  smaller  m.s.e.  than  either  of  the 
two  approximate  methods. 

Simulations  were  also  performed  for  the  cases  when  all  the  roots  of  det  6(z)  = 0 lie 
away  from  the  unit  circle.  As  expected,  it  was  found  that  the  two  approximate  methods  and 
the  exact  method  gave  similar  m.s.e. 

7.  CONCLUSIONS 

i)  Three  algorithms  are  proposed  for  computing  the  exact  l.l.f.  of  the  multivariate  moving 
average  processes.  Formulas  for  operation  counts  of  the  algorithms  are  given  as  a guide 
in  selecting  the  best  application  for  a given  problem.  It  is  then  shown  how  any  one  of 
these  algoirithms  can  be  modified  for  computing  the  exact  l.l.f.  of  multivariate  autore- 
gressive moving  average  processes. 

ii)  Monte  Carlo  simulations  are  performed  to  compare  the  m.s.e.  of  estimates  from  the  time 
domain  and  frequency  domain  approximate  methods  and  the  exact  likelihood  method. 
The  simulations  show  that  i)  the  exact  likelihood  method  has  the  smallest  m.s.e.  and  ii) 
the  time  domain  approximation  gives  smaller  m.s.e.  than  the  frequency  domain  approxi- 
mation. 
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Table  I Mean  Squared  Errors  of  Estimates  for  the  Univariate  MA(1)  Model:  Results 
of  100  simulations  with  0(1)  = -95,  oa2  = 1 and  n = 100. 


Time  Domain 
Approx. 

0(1)  .00695 

a 2 .00671 


Freq.  Domain  Exact 

Approx. 

.01472  .00180 

.01481  .00107 


Table  2.  Mean  Squared  Errors  of  Estimates  for  the  Univariate  MA(2)  Model:  Results 
of  100  simulations  with  0(1)  = 0,  0(2)  = .95,  oa2  = 1 and  n = 100. 


Time  Domain 
Approx. 

Freq.  Domain 
Approx. 

Exact 

0(1) 

.00360 

.00468 

.00241 

0(2) 

.00916 

.02442 

.00221 

.01012 

.02641 

.00191 

" ■ 1 


W^m — m ' 
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Table  3.  Mean  Squared  Errors  of  Estimates  for  the  Bivariate  MA(1)  Model:  Results  of 
100  simulations  with 


Time  Domain 
Approx. 

Freq.  Domain 
Approx. 

Exact 

021<O) 

.01544 

.01799 

.00970 

.05580 

.06883 

.03940 

•2.(1) 

.00917 

.01034 

.01029 

•n0> 

.00893 

.01408 

.00145 

*22(i) 

.00696 

.01106 

.00066 

*,2 

.05298 

.08557 

.00251 

°22 

.00801 

.01404 

.00266 
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